On Proxy Variables and Categorical Data Fusion

Author

Li-Chun Zhang

Abstract

The problem of inference about the joint distribution of two categorical variables based on knowledge or observations of their marginal distributions, to be referred to as categorical data fusion in this paper, is relevant in statistical matching, ecological inference, market research, and several other related fields. This article organizes the use of proxy variables, to be distinguished from other auxiliary variables, both in terms of their effects on the uncertainty of fusion and the techniques of fusion. A measure of the gains of efficiency is provided, which incorporates both the identification uncertainty associated with data fusion and the sampling uncertainty that arises when the theoretical bounds of the uncertainty space are unknown and need to be estimated. Several existing techniques for generating fusion distributions (or datasets) are described and some new ones proposed. Analysis of real-life data demonstrates empirically that proxy variables can make data fusion more precise and the constructed fusion distribution more plausible.
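The "uncertainty space" in the abstract can be made concrete in its simplest form. The sketch below assumes that P(Z), P(X | Z) and P(Y | Z) are known exactly, so it ignores the sampling uncertainty that the paper also accounts for. It computes the classical Fréchet bounds on each cell of the X-by-Y table from the marginals alone, and then the narrower bounds obtained when an auxiliary variable Z is observed jointly with both X and Y. This is one simple way a proxy can enter; the paper's own definition of a proxy variable may be more specific. The function names, the made-up probabilities and the "total width" summary are illustrative assumptions, not the paper's efficiency measure or its fusion techniques. The conditional-independence table is shown only as one standard fusion distribution that lies inside the bounds.

```python
from itertools import product

# Hypothetical helper names for illustration; not taken from the paper.
Bounds = dict[tuple[int, int], tuple[float, float]]


def marginal(p_z, p_given_z):
    """P(V=k) = sum over z of P(Z=z) * P(V=k | Z=z)."""
    n = len(p_given_z[0])
    return [sum(pz * cond[k] for pz, cond in zip(p_z, p_given_z)) for k in range(n)]


def frechet_bounds(p_x, p_y) -> Bounds:
    """Fréchet bounds on each cell P(X=i, Y=j) given only the two marginals."""
    return {
        (i, j): (max(0.0, p_x[i] + p_y[j] - 1.0), min(p_x[i], p_y[j]))
        for i, j in product(range(len(p_x)), range(len(p_y)))
    }


def conditional_frechet_bounds(p_z, p_x_given_z, p_y_given_z) -> Bounds:
    """Bounds on P(X=i, Y=j) when Z is observed with both X and Y:
    apply the Fréchet bounds within each Z-stratum, then weight by P(Z)."""
    bounds = {}
    for i, j in product(range(len(p_x_given_z[0])), range(len(p_y_given_z[0]))):
        lo = hi = 0.0
        for pz, px, py in zip(p_z, p_x_given_z, p_y_given_z):
            lo += pz * max(0.0, px[i] + py[j] - 1.0)
            hi += pz * min(px[i], py[j])
        bounds[(i, j)] = (lo, hi)
    return bounds


def cia_fusion(p_z, p_x_given_z, p_y_given_z):
    """One point in the uncertainty space: X and Y independent given Z."""
    return {
        (i, j): sum(pz * px[i] * py[j] for pz, px, py in zip(p_z, p_x_given_z, p_y_given_z))
        for i, j in product(range(len(p_x_given_z[0])), range(len(p_y_given_z[0])))
    }


def total_width(bounds: Bounds) -> float:
    """Crude summary of identification uncertainty: summed width of the cell intervals."""
    return sum(hi - lo for lo, hi in bounds.values())


# Illustrative (made-up) distributions: binary X, Y and a binary Z.
p_z = [0.5, 0.5]
p_x_given_z = [[0.9, 0.1], [0.1, 0.9]]
p_y_given_z = [[0.8, 0.2], [0.2, 0.8]]

p_x = marginal(p_z, p_x_given_z)  # approximately [0.5, 0.5]
p_y = marginal(p_z, p_y_given_z)  # approximately [0.5, 0.5]

without_proxy = frechet_bounds(p_x, p_y)
with_proxy = conditional_frechet_bounds(p_z, p_x_given_z, p_y_given_z)
fused = cia_fusion(p_z, p_x_given_z, p_y_given_z)

print(f"total width without Z: {total_width(without_proxy):.2f}")
print(f"total width with Z:    {total_width(with_proxy):.2f}")
print("cell (0, 0) bounds with Z:", [round(b, 2) for b in with_proxy[(0, 0)]])
print("cell (0, 0) under conditional independence:", round(fused[(0, 0)], 2))
```

With these numbers both marginals are (0.5, 0.5), so without Z each cell is only known to lie in [0, 0.5] and the summed width is 2.00. Conditioning on Z shrinks it to 0.40 (for example, cell (0, 0) narrows to [0.35, 0.45]), and the conditional-independence value 0.37 falls inside the narrower interval.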


Similar resources

On the uncertainty and techniques of categorical data fusion

Statistical matching (or data fusion) has long been used to merge separate data files in order to generate a joint fusion data set. Since the target joint data are not observable, it is recognized that, in addition to sampling variations that exist in the separate data files, there is an identification uncertainty associated with the assumptions that underpin the fusion procedure. In this paper...


A multinomial logistic regression model with missing values and its application to the study of goiter

In large-scale sampling operations (e.g. nation-wide health surveys) we always face the problem of item non-response and/or unit non-response. In fitting a model to the data we have two groups of variables, namely dependent and independent variables. Non-response may occur for any of these groups of variables. In this paper we assume Y as a categorical dependent variable with three levels...


Town trip forecasting based on data mining techniques

In this paper, a data mining approach is proposed for duration prediction of the town trips (travel time) in New York City. In this regard, at first, two novel approaches, including a mathematical and a statistical approach, are proposed for grouping categorical variables with a huge number of levels. The proposed approaches work based on the cost matrix generated by repetitive post-hoc tests f...


Fractured Reservoirs History Matching based on Proxy Model and Intelligent Optimization Algorithms

In this paper, a new robust approach based on Least Square Support Vector Machine (LSSVM) as a proxy model is used for an automatic fractured reservoir history matching. The proxy model is made to model the history match objective function (mismatch values) based on the history data of the field. This model is then used to minimize the objective function through Particle Swarm Optimization (...


Presenting a structural model to explain academic Burnout of medical sciences students based on thought action fusion, emotion control and imposter syndrome

Psychological variables in university environments which are diverse in terms of individual and personality differences increase student adaptability and affect their academic performance. The purpose of this study was to determine the relationship between the thought action fusion and emotional control with the symptoms of academic burnout in students through the mediation role of imposter syn...



Publication date: 2015
